Semi-Automated Transcription Generation for Pashto Cursive Script

Authors

  • Saeeda Naz
  • Riaz Ahmad
  • Muhammad Zeshan Afzal
  • Sheikh Faisal Rashid
Abstract

Training and benchmarking Optical Character Recognition (OCR) systems for new scripts such as Pashto usually requires a large amount of transcription data. Real image data is mostly acquired through scanning. Supervised training requires ground truth for each scanned image. This ground truth is usually created by transcribing the documents manually, which is overwhelmingly laborious. This work introduces a semi-automated procedure for transcribing Pashto document images using a long short-term memory (LSTM) network architecture. The procedure is applied to the transcription of 1000 images containing Pashto ligatures, and it improves transcription performance to around three times that of the manual method.

Similar resources

Robust Optical Recognition of Cursive Pashto Script Using Scale, Rotation and Location Invariant Approach

The presence of a large number of unique shapes called ligatures in cursive languages, along with variations due to scaling, orientation and location provides one of the most challenging pattern recognition problems. Recognition of the large number of ligatures is often a complicated task in oriental languages such as Pashto, Urdu, Persian and Arabic. Research on cursive script recognition ofte...

Invariant Handwriting Features Useful in Cursive-Script Recognition

The within-writer variability of handwriting forms one of the problems in the automatic recognition of cursive script. Variability can be handled by choosing handwriting features based upon the process of handwriting generation or upon computational models. Handwriting patterns are represented by a sequence of motor actions, i.e., "strokes", which can be identified by invariant segmentation. Ea...

Cursive Script Postal Address Recognition

By Prasun Sinha. Large variations in writing styles and difficulty in segmenting cursive words are the main reasons for cursive script postal address recognition being a challenging task. A scheme for locating and recognizing words based on over-segmentation followed by dynamic programming is proposed. This technique is being used for zip code extraction as ...

Building a Perception Based Model for Reading Cursive Script

This paper presents a new perception-based model for reading cursive script. We describe the organization of our pseudo-neuronal system and show the role of the activation mechanism in perceiving and reading cursive script. We have introduced into our model some characteristics specific to cursive script. First, we use more appropriate features such as ascenders and descenders. Second, we deal with ...

Publication date: 2016